Multi-species Occupancy Models

Ecological Forecasting Initiative

Christopher Rota

4/4/2022

Acknowledgments

Multi-species Occupancy Models (MSOM)

What can we Learn from MSOMs?

Different Flavors of MSOMs

Today’s webinar is focused on techniques described by Rota et al. 2016.

Technical Details: Natural Parameters

Technical Details: Natural Parameters

Let \(S\) represent the number of species

Example: let \(S = 3\):

I only incorporate \(1^{st}\) and \(2^{nd}\) order natural parameters for this talk.

Technical Details: Marginal Occupancy

Technical Details: Conditional Occupancy

Basic Sampling Procedure

Same general procedure for single-species occupancy models:

Impelementation in unmarked

Let’s first review our example data set and then format for analysis in unmarked.

Data

Data

Let’s read in and examine our data.

Load detection / non-detection data:

bob <- read.csv('data/Bobcat_3wk.csv', header = F)
coy <- read.csv('data/Coyote_3wk.csv', header = F)
fox <- read.csv('data/RedFox_3wk.csv', header = F)

Inspect red fox detection / non-detection data:

head(fox)
##   V1 V2 V3
## 1  0  0  0
## 2  0  0  0
## 3  0  0  1
## 4  0  0  0
## 5  0  0  0
## 6  1  1  1

Data

Load site-level covariate data:

occ_covs <- read.csv('data/psi_cov3wk.csv')

Inspect site-level covariates:

head(occ_covs)
##   Dist_5km HDens_5km  Latitude  Longitude People_site Trail
## 1     0.04  9.345258 0.3899441 -0.7723958       0.857     1
## 2     0.03  9.499197 0.3899250 -0.7723920       0.002     0
## 3     0.03  9.647173 0.3899111 -0.7723954       0.387     1
## 4     0.03  9.598066 0.3899166 -0.7723972       0.003     0
## 5     0.03  9.607825 0.3899179 -0.7724004       0.000     0
## 6     0.03  9.748791 0.3899058 -0.7724046       0.443     1

Data

Site-level covariate descriptions:

Data

Load detection covariates:

det_covs <- read.csv('data/detection data.csv')

Inspect detection covariates:

head(det_covs)
##        Temp1      Temp2      Temp3
## 1  0.8685909 0.03711437 -0.1788084
## 2  0.8497885 0.92470119  1.3023852
## 3  0.8556828 0.92044332  0.1927059
## 4  0.8555556 0.92032423  0.1975017
## 5  0.8554921 0.92025869  0.1974762
## 6 -0.2674792 0.38300270  0.1003521

Loading the unmarked package

Load the unmarked package

# devtools::install_github("rbchan/unmarked") this will take a while
library(unmarked)

Formatting Data

Formatting

Place detection / non-detection data into a named list.

y_list <- list(bobcat = as.matrix(bob),
               coyote = as.matrix(coy),
               redfox = as.matrix(fox))

Place detection covariates into a named list.

det_list <- list(temp = det_covs)

Combine data into an unmarkedFrameOccuMulti object:

msom_data <- unmarkedFrameOccuMulti(y = y_list,
                                    siteCovs = occ_covs,
                                    obsCovs = det_list)

Intercept-only model, assuming independence

Syntax similar to other models in unmarked

Fit a model

For now, we assume independence among species. We do this by only allowing \(1^{st}\) order natural parameters (maxOrder = 1).

fit_1 <- occuMulti(detformulas = c('~1', '~1', '~1'),
                   stateformulas = c('~1', '~1', '~1'),
                   maxOrder = 1,
                   data = msom_data)

Intercept-only model, assuming independence

This is equivalent to fitting 3 single-species occupancy models

summary(fit_1)
## 
## Call:
## occuMulti(detformulas = c("~1", "~1", "~1"), stateformulas = c("~1", 
##     "~1", "~1"), data = msom_data, maxOrder = 1)
## 
## Occupancy (logit-scale):
##                      Estimate     SE      z  P(>|z|)
## [bobcat] (Intercept)   -1.171 0.1328  -8.82 1.13e-18
## [coyote] (Intercept)   -0.629 0.0744  -8.46 2.79e-17
## [redfox] (Intercept)   -1.846 0.0944 -19.56 3.58e-85
## 
## Detection (logit-scale):
##                      Estimate     SE     z  P(>|z|)
## [bobcat] (Intercept)   -1.104 0.1396 -7.91 2.59e-15
## [coyote] (Intercept)   -0.333 0.0764 -4.36 1.30e-05
## [redfox] (Intercept)   -0.252 0.1182 -2.13 3.29e-02
## 
## AIC: 6710.658 
## Number of sites: 1437
## optim convergence code: 0
## optim iterations: 50 
## Bootstrap iterations: 0

Intercept-only model, assuming dependence

fit_2 <- occuMulti(detformulas = c('~1', '~1', '~1'),
                   stateformulas = c('~1', '~1', '~1',
                                     '~1', '~1', '~1'),
                   maxOrder = 2,
                   data = msom_data)

Intercept-only model, assuming dependence

Inspecting the \(2^{nd}\) order natural parameters from the fitted model permits us to evaluate interspecific dependence.

summary(fit_2)
## 
## Call:
## occuMulti(detformulas = c("~1", "~1", "~1"), stateformulas = c("~1", 
##     "~1", "~1", "~1", "~1", "~1"), data = msom_data, maxOrder = 2)
## 
## Occupancy (logit-scale):
##                             Estimate    SE      z  P(>|z|)
## [bobcat] (Intercept)           -1.76 0.179  -9.81 1.03e-22
## [coyote] (Intercept)           -1.30 0.137  -9.54 1.37e-21
## [redfox] (Intercept)           -2.20 0.152 -14.44 2.81e-47
## [bobcat:coyote] (Intercept)     1.72 0.262   6.56 5.48e-11
## [bobcat:redfox] (Intercept)    -1.38 0.377  -3.66 2.57e-04
## [coyote:redfox] (Intercept)     1.41 0.248   5.69 1.31e-08
## 
## Detection (logit-scale):
##                      Estimate     SE     z  P(>|z|)
## [bobcat] (Intercept)   -1.106 0.1398 -7.91 2.59e-15
## [coyote] (Intercept)   -0.331 0.0761 -4.35 1.38e-05
## [redfox] (Intercept)   -0.253 0.1183 -2.13 3.28e-02
## 
## AIC: 6626.111 
## Number of sites: 1437
## optim convergence code: 0
## optim iterations: 48 
## Bootstrap iterations: 0

Incorporating covariates

The model below is driven less by biology and more by an interest in demonstrating that each parameter can be modeled uniquely.

fit_3 <- occuMulti(detformulas = c('~temp', '~1', '~1'),
                   stateformulas = c('~Dist_5km', '~HDens_5km', '~People_site',
                                     '~Latitude', '~1', '~1'),
                   maxOrder = 2,
                   data = msom_data)

Incorporating covariates

Interpreting output

summary(fit_3)
## 
## Call:
## occuMulti(detformulas = c("~temp", "~1", "~1"), stateformulas = c("~Dist_5km", 
##     "~HDens_5km", "~People_site", "~Latitude", "~1", "~1"), data = msom_data, 
##     maxOrder = 2)
## 
## Occupancy (logit-scale):
##                              Estimate      SE      z  P(>|z|)
## [bobcat] (Intercept)         -1.35970 0.19131  -7.11 1.18e-12
## [bobcat] Dist_5km           -22.95531 5.90062  -3.89 1.00e-04
## [coyote] (Intercept)         -1.48520 0.17195  -8.64 5.74e-18
## [coyote] HDens_5km            0.00424 0.00232   1.83 6.74e-02
## [redfox] (Intercept)         -2.45677 0.16226 -15.14 8.71e-52
## [redfox] People_site          5.92772 1.27022   4.67 3.06e-06
## [bobcat:coyote] (Intercept)   8.89548 2.78776   3.19 1.42e-03
## [bobcat:coyote] Latitude    -18.97578 7.30137  -2.60 9.35e-03
## [bobcat:redfox] (Intercept)  -1.55839 0.42126  -3.70 2.16e-04
## [coyote:redfox] (Intercept)   1.55320 0.26507   5.86 4.64e-09
## 
## Detection (logit-scale):
##                      Estimate     SE     z  P(>|z|)
## [bobcat] (Intercept)   -1.248 0.1439 -8.67 4.16e-18
## [bobcat] temp          -0.306 0.0848 -3.60 3.12e-04
## [coyote] (Intercept)   -0.322 0.0756 -4.26 2.05e-05
## [redfox] (Intercept)   -0.494 0.1295 -3.81 1.36e-04
## 
## AIC: 6544.769 
## Number of sites: 1437
## optim convergence code: 0
## optim iterations: 133 
## Bootstrap iterations: 0

Sample interpretations:

Conditional occupancy probability

Calculation of conditional and marginal occupancy probabilities is done with the predict function.

nd_cond <- data.frame(
  Dist_5km = rep(mean(occ_covs$Dist_5km), 100),
  HDens_5km = rep(mean(occ_covs$HDens_5km), 100),
  People_site = rep(mean(occ_covs$People_site), 100),
  Latitude = seq(min(occ_covs$Latitude), max(occ_covs$Latitude),
                 length.out = 100)
)

Conditional occupancy probability

Predicting bobcat occurrence when coyotes are present

bob_coy_1 <- predict(fit_3, type = 'state', species = 'bobcat',
                     cond = 'coyote', newdata = nd_cond)
## Bootstrapping confidence intervals with100samples

Predicting bobcat occurrence when coyotes are absent

bob_coy_0 <- predict(fit_3, type = 'state', species = 'bobcat',
                     cond = '-coyote', newdata = nd_cond)
## Bootstrapping confidence intervals with100samples

Conditional occupancy probability

Formatting data for plotting in ggplot. I skip details, though this procedure is standard for plotting with ggplot.

gg_df_cond <- data.frame(
  latitude = rep(nd_cond$Latitude, 2),
  occupancy = c(bob_coy_1$Predicted,
                bob_coy_0$Predicted),
  low = c(bob_coy_1$lower,
          bob_coy_0$lower),
  high = c(bob_coy_1$upper,
           bob_coy_0$upper),
  conditional = rep(c('Coyote present', 'Coyote absent'),
                    each = 100)
)

Creating a line plot, with ribbons representing limits of 95% confidence intervals. This is standard ggplot2 code.

library(ggplot2)

cond_fig <- ggplot(gg_df_cond, aes(x = latitude, y = occupancy,
                                   group = conditional)) +
  geom_ribbon(aes(ymin = low, ymax = high, fill = conditional)) +
  geom_line() +
  ylab('Conditional bobcat\noccupancy probability') +
  xlab('Latitude') +
  labs(fill = 'Coyote state') +
  theme(text = element_text(size = 25),
        legend.position = c(0.75, 0.85))

Conditional occupancy probability

cond_fig

Notice that when coyotes are present, there is a negative correlation with occupancy and latitude.

However, notice there is no relationship when coyotes are absent. This is consistent with our model for the bobcat \(1^{st}\) order natural parameter, which is conditional on absence of coyote (and all other species).

Marginal occupancy probability

nd_marg <- data.frame(
  Dist_5km = seq(min(occ_covs$Dist_5km), max(occ_covs$Dist_5km),
                 length.out = 100),
  HDens_5km = rep(mean(occ_covs$HDens_5km), 100),
  People_site = rep(mean(occ_covs$People_site), 100),
  Latitude = rep(mean(occ_covs$Latitude), 100)
)

Marginal occupancy probability

bob_marg <- predict(fit_3, type = 'state', species = 'bobcat',
                    newdata = nd_marg)
## Bootstrapping confidence intervals with100samples

Marginal occupancy probability

Formatting data for ggplot2

gg_df_marg <- data.frame(
  hd = nd_marg$Dist_5km,
  occupancy = bob_marg$Predicted,
  low = bob_marg$lower,
  high = bob_marg$upper
)
marg_fig <- ggplot(gg_df_marg, aes(x = hd, y = occupancy)) +
  geom_ribbon(aes(ymin = low, ymax = high), alpha = 0.5) +
  geom_line() +
  ylab('Marginal bobcat\noccupancy probability') +
  xlab('Housing density') +
  theme(text = element_text(size = 25))

Marginal occupancy probability

marg_fig

Model checking

Breaking the model

A common mistake I see is for students to fit the biggest possible model right out of the gate. Let’s see what this might look like:

fit_4 <- occuMulti(detformulas = c('~temp', '~temp', '~temp'),
                   stateformulas = c('~Dist_5km', '~HDens_5km', '~People_site',
                                     '~Latitude + Longitude',
                                     '~Latitude + Longitude',
                                     '~Latitude + Longitude'),
                   maxOrder = 2,
                   data = msom_data,
                   control = list(maxit = 1000))

Breaking the model

summary(fit_4)
## 
## Call:
## occuMulti(detformulas = c("~temp", "~temp", "~temp"), stateformulas = c("~Dist_5km", 
##     "~HDens_5km", "~People_site", "~Latitude + Longitude", "~Latitude + Longitude", 
##     "~Latitude + Longitude"), data = msom_data, maxOrder = 2, 
##     control = list(maxit = 1000))
## Warning in sqrt(diag(vcov(obj, fixedOnly = fixedOnly))): NaNs produced
## Occupancy (logit-scale):
##                              Estimate      SE       z   P(>|z|)
## [bobcat] (Intercept)        -1.70e+00  0.2608  -6.527  6.69e-11
## [bobcat] Dist_5km           -2.98e+01  6.0387  -4.936  7.96e-07
## [coyote] (Intercept)        -1.16e+00  0.1459  -7.982  1.44e-15
## [coyote] HDens_5km          -2.02e-03  0.0024  -0.844  3.99e-01
## [redfox] (Intercept)        -3.25e+01     NaN     NaN       NaN
## [redfox] People_site         2.22e+00  0.4049   5.491  3.99e-08
## [bobcat:coyote] (Intercept) -5.14e+01  9.6911  -5.302  1.15e-07
## [bobcat:coyote] Latitude     3.80e+01 12.6618   3.003  2.68e-03
## [bobcat:coyote] Longitude   -4.92e+01  7.5292  -6.539  6.18e-11
## [bobcat:redfox] (Intercept) -1.15e+02  5.3643 -21.419 8.82e-102
## [bobcat:redfox] Latitude    -7.35e+01 26.5591  -2.766  5.68e-03
## [bobcat:redfox] Longitude   -1.82e+02 16.6005 -10.954  6.35e-28
## [coyote:redfox] (Intercept)  1.85e+02     NaN     NaN       NaN
## [coyote:redfox] Latitude     4.45e+01 23.9976   1.852  6.40e-02
## [coyote:redfox] Longitude    2.18e+02 20.0605  10.845  2.10e-27
## 
## Detection (logit-scale):
##                      Estimate     SE       z  P(>|z|)
## [bobcat] (Intercept)  -1.2307 0.1214 -10.135 3.88e-24
## [bobcat] temp         -0.3107 0.0828  -3.752 1.76e-04
## [coyote] (Intercept)  -0.7393 0.0687 -10.754 5.68e-27
## [coyote] temp         -0.0983 0.0577  -1.704 8.84e-02
## [redfox] (Intercept)  -0.1775 0.1072  -1.656 9.77e-02
## [redfox] temp         -0.0877 0.1043  -0.841 4.01e-01
## 
## AIC: 6311.066 
## Number of sites: 1437
## optim convergence code: 0
## optim iterations: 271 
## Bootstrap iterations: 0

Notice the unreasonably large coefficient estimates and standard errors.

Penalized likelihood

Penalized likelihood

# this is slow to run
opt_pen <- optimizePenalty(fit_4,
                            penalties = 2 ^ seq(-5, 5),
                            boot = 100)

The above function will take a while to run, and the resulting object is too big to store on GitHub. You can download the .rds file here:

Download opt_pen.rds

Penalized likelihood

Execute this code if you downloaded opt-pen.rds

opt_pen <- readRDS('data/opt-pen.rds')
summary(opt_pen)
## 
## Call:
## occuMulti(detformulas = c("~temp", "~temp", "~temp"), stateformulas = c("~Dist_5km", 
## "~HDens_5km", "~People_site", "~Latitude + Longitude", "~Latitude + Longitude", 
## "~Latitude + Longitude"), data = object@data, maxOrder = 2, penalty = 0.03125, 
##     boot = boot)
## 
## Occupancy (logit-scale):
##                              Estimate     SE     z  P(>|z|)
## [bobcat] (Intercept)         -1.51740 0.2511 -6.04 1.52e-09
## [bobcat] Dist_5km           -14.42998 2.9613 -4.87 1.10e-06
## [coyote] (Intercept)         -1.57083 0.3716 -4.23 2.36e-05
## [coyote] HDens_5km            0.00511 0.0027  1.89 5.81e-02
## [redfox] (Intercept)         -2.86410 0.9128 -3.14 1.70e-03
## [redfox] People_site          4.16768 1.8194  2.29 2.20e-02
## [bobcat:coyote] (Intercept)  -9.78902 1.5066 -6.50 8.17e-11
## [bobcat:coyote] Latitude     -3.90075 2.9684 -1.31 1.89e-01
## [bobcat:coyote] Longitude   -16.81603 2.3686 -7.10 1.25e-12
## [bobcat:redfox] (Intercept)  -0.88576 0.5262 -1.68 9.23e-02
## [bobcat:redfox] Latitude      2.72790 1.6105  1.69 9.03e-02
## [bobcat:redfox] Longitude     2.56619 1.2868  1.99 4.61e-02
## [coyote:redfox] (Intercept)   8.44331 2.1693  3.89 9.94e-05
## [coyote:redfox] Latitude     19.98990 4.7800  4.18 2.89e-05
## [coyote:redfox] Longitude    17.28457 3.7233  4.64 3.45e-06
## 
## Detection (logit-scale):
##                      Estimate     SE      z  P(>|z|)
## [bobcat] (Intercept)  -1.3558 0.1858 -7.299 2.90e-13
## [bobcat] temp         -0.3275 0.0848 -3.862 1.13e-04
## [coyote] (Intercept)  -0.4880 0.1462 -3.338 8.43e-04
## [coyote] temp         -0.0755 0.0789 -0.956 3.39e-01
## [redfox] (Intercept)  -0.3432 0.1670 -2.055 3.99e-02
## [redfox] temp          0.0853 0.1619  0.527 5.98e-01
## 
## AIC: 6509.693 
## Number of sites: 1437
## optim convergence code: 0
## optim iterations: 104 
## Bootstrap iterations: 100

Notice the reasonable coefficients and standard errors.

Continuous-time MSOM

Questions?